Performance Evaluation of Cache Depot on CC-NUMA Multiprocessors
Abstract
Cache depot is a performance enhancement technique for cache-coherent non-uniform memory access (CC-NUMA) multiprocessors in which nodes store extra memory blocks on behalf of other nodes. A memory request from a node can then be satisfied by a nearby depot node instead of traveling all the way to the home node. This reduces memory access latency and network traffic and also spreads the network load more evenly. In this paper, we study a design strategy for cache depot that (1) enhances the network interface of each node with a depot cache, which stores the extra memory blocks held for other nodes, and (2) employs a new multicast routing scheme, called multi-hop worms, which works cooperatively with the depot caches to transmit coherence messages. Because message routing and depot caches are considered together, the design concept applies even to CC-NUMA systems with a non-hierarchical, scalable interconnection network. We have developed an execution-driven simulator to evaluate the effectiveness of the design strategy. Results from four SPLASH-2 benchmarks show that it improves the performance of the CC-NUMA multiprocessor by 11% to 21%. We have also studied in depth the factors that affect the performance of cache depot.
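The request path described above (a read satisfied by a nearby depot node instead of the home node) can be illustrated with a small model. This is only a sketch under assumptions that the abstract does not state: a 2D mesh with dimension-order (XY) routing, a fixed-capacity LRU depot cache at every node, and a read that stops at the first depot node on the route that holds the block. The names `DepotCache`, `xy_route`, and `read_block` are hypothetical, and the model ignores coherence, the depot placement policy, and the multi-hop worm multicast scheme.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinate in an assumed 2D mesh


class DepotCache:
    """Hypothetical depot cache: an LRU set of block addresses held for other nodes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> None, ordered oldest to newest use

    def lookup(self, block: int) -> bool:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        return False

    def insert(self, block: int) -> None:
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Nodes visited by X-then-Y routing, excluding src and including dst (assumed routing)."""
    x, y = src
    path = []
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def read_block(requester: Node, home: Node, block: int,
               depots: Dict[Node, DepotCache]) -> Tuple[Node, int]:
    """Return (serving node, round-trip hop count) for a read of `block`."""
    for hops, node in enumerate(xy_route(requester, home), start=1):
        if node == home:
            return home, 2 * hops  # no depot hit: request and reply cross every link
        if depots[node].lookup(block):
            return node, 2 * hops  # served early by a depot node on the route
    return home, 0  # requester is the home node: purely local access


if __name__ == "__main__":
    # Assumed 4x4 mesh; each node has a 4-block depot cache.
    depots = {(x, y): DepotCache(capacity=4) for x in range(4) for y in range(4)}
    depots[(2, 0)].insert(0x40)  # node (2, 0) holds block 0x40 on behalf of others
    print(read_block((0, 0), (3, 3), 0x40, depots))  # ((2, 0), 4)
    print(read_block((0, 0), (3, 3), 0x80, depots))  # ((3, 3), 12)
```

In this toy setting the depot hit cuts the round trip from 12 link traversals to 4, which is the kind of latency and traffic reduction the abstract attributes to cache depot; the actual savings in the paper come from its own simulator and benchmarks, not from this model.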
Similar resources
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
Cache coherent nonuniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory. However, they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and large data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote...
Design and Evaluation of a Switch
Cache coherent non-uniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory, but they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory...
Excel-NUMA: Toward Programmability, Simplicity, and High Performance
While hardware-coherent scalable shared-memory multiprocessors are relatively easy to program, they still require substantial programming effort to deliver high performance. Specifically, to minimize remote accesses, data must be carefully laid out in memory for locality and application working sets carefully tuned for caches. It has been claimed that this programming effort is less necessary ...
Performance Evaluation of Memory Allocation Schemes on CC-NUMA Multiprocessors
Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from both academia and industry. This paper studies the performance impact of design choices at different levels of address and memory mapping on CC-NUMA architectures. Through execution-driven simulations of five numerical programs, we find close interactions between data allocation, global address t...
Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors
Cache coherent non-uniform memory access (CC-NUMA) multiprocessors continue to suffer from remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache. The main idea is to implement small fast caches in crossbar switches of the interconnect ...